Pairwise Constraint Propagation on Multi-View Data

Zhiwu Lu and Liwei Wang 


Abstract: This paper presents a graph-based learning approach to pairwise
constraint propagation on multi-view data. Although pairwise constraint
propagation has been studied extensively, pairwise constraints are usually
defined over pairs of data points from a single view, i.e., only intra-view
constraint propagation is considered for multi-view tasks. In fact, very
little attention has been paid to inter-view constraint propagation, which is
more challenging since pairwise constraints are now defined over pairs of
data points from different views. In this paper, we propose to decompose the
challenging inter-view constraint propagation problem into semi-supervised
learning subproblems so that they can be efficiently solved based on
graph-based label propagation. To the best of our knowledge, this is the
first attempt to give an efficient solution to inter-view constraint
propagation from a semi-supervised learning viewpoint. Moreover, since
graph-based label propagation has been adopted for basic optimization, we
develop two constrained graph construction methods for inter-view constraint
propagation, which only differ in how the intra-view pairwise constraints are
exploited. The experimental results in cross-view retrieval have shown the
promising performance of our inter-view constraint propagation.

Index Terms: Pairwise constraint propagation, multi-view data, label
propagation, graph construction, cross-view retrieval


I. Introduction 

As an alternative type of supervisory information easier to access than the
class labels of data points, pairwise constraints are widely used for
different machine learning tasks in the literature. To effectively exploit
pairwise constraints for clustering or classification, much attention has
been paid to pairwise constraint propagation [?]-[?]. Different from the
method [?] which only adjusts the similarities between constrained data
points, these approaches can propagate pairwise constraints to other
similarities between unconstrained data points and thus achieve better
results in most cases. More importantly, given that each pairwise constraint
is actually defined over a pair of data points from a single view, these
approaches can all be regarded as intra-view constraint propagation when
multi-view data is concerned. Since we have to learn the relationships
(must-link or cannot-link) between data points, intra-view constraint
propagation is more challenging than the traditional label propagation
[9]-[14], whose goal is only to predict the labels of unlabeled data points.

However, besides intra-view pairwise constraints, we may also have easy
access to inter-view pairwise constraints in multi-view tasks such as
cross-view retrieval [?], where each pairwise constraint is defined over a
pair of data points from different views (see Fig. 1). In this case,
inter-view pairwise constraints still specify the must-link or cannot-link
relationships between data points. Since the similarity of two data points
from different views is commonly unknown in practice, inter-view constraint
propagation is significantly more challenging than intra-view constraint
propagation. In fact, very little attention has been paid to inter-view
constraint propagation for multi-view tasks in the literature. Although
pairwise constraint propagation has been successfully applied to multi-view
clustering in [16], [17], only intra-view pairwise constraints are propagated
across different views. Here, it should be noted that these two constraint
propagation methods have actually ignored the concept of inter-view pairwise
constraints or the strategy of inter-view constraint propagation.

Since multi-view data can be readily decomposed into a series of two-view
data, we focus on inter-view constraint propagation only across two views in
this paper. However, such inter-view constraint propagation remains a rather
challenging task. Fortunately, from a semi-supervised learning viewpoint, we
can formulate inter-view constraint propagation as minimizing a regularized
energy functional. Specifically, we first decompose the inter-view constraint
propagation problem into a set of independent semi-supervised learning
[9]-[12] subproblems. Through formulating these subproblems uniformly as
minimizing a regularized energy functional, we thus develop an efficient
algorithm for inter-view constraint propagation based on the traditional
graph-based label propagation technique [?]. In summary, we succeed in giving
an insightful explanation of inter-view constraint propagation from a
graph-based semi-supervised learning viewpoint.

However, since graph-based label propagation has been adopted for basic
optimization, one problem remains to be addressed in inter-view constraint
propagation, i.e., how to exploit intra-view pairwise constraints for graph
construction within each view. In this paper, we develop two constrained
graph construction methods for inter-view constraint propagation, which only
differ in how the intra-view pairwise constraints are exploited. The first
method limits our inter-view constraint propagation to a single view and then
utilizes the constraint propagation results to adjust the weight matrix of
each view, while the second method formulates graph construction as sparse
representation and then directly adds the intra-view pairwise constraints
into sparse representation.

The flowchart of our inter-view constraint propagation with constrained graph
construction is illustrated in Fig. 1, where only two views (i.e. text and
image) are considered. It should be noted that, when multiple views refer to
text, image, audio and so on, the output of our inter-view constraint
propagation actually denotes the correlation between different media views.
That is, the proposed algorithm can be directly used for cross-view retrieval
(also see examples in Fig. [?]), which has drawn much attention recently
[15]. For cross-view retrieval, it is not feasible to combine multiple views
as previous multi-view retrieval methods [18], [?] do. More notably, the two
closely related methods [16], [17] for multi-view clustering are not suitable
for cross-view retrieval.

Fig. 1. Illustration of the flowchart of inter-view constraint propagation
(Inter-CP) with constrained graph construction (CGC). Here, we only consider
two different views: text and image. Moreover, Intra-PCs and Inter-PCs denote
intra-view and inter-view pairwise constraints, respectively.

Finally, to emphasize our main contributions, we summarize 
the following distinct advantages of our pairwise constraint 
propagation on multi-view data: 

• We have made the first attempt to give an efficient solution to inter-view
constraint propagation from a graph-based semi-supervised learning
viewpoint.

• We have developed two constrained graph construction methods so that the
intra-view pairwise constraints can also be exploited for inter-view
constraint propagation.

• When applied to cross-view retrieval, our inter-view constraint propagation
has been shown to achieve promising results with respect to the
state-of-the-art.

• Although only evaluated in cross-view retrieval, our inter-view constraint
propagation can be readily extended to many other multi-view tasks.

The remainder of this paper is organized as follows. In Section II, we
formulate inter-view constraint propagation from a semi-supervised learning
viewpoint. In Section III, we develop two constrained graph construction
methods for our inter-view constraint propagation. In Section IV, our
inter-view constraint propagation is applied to cross-view retrieval.
Finally, Sections V and VI provide the experimental results and conclusions,
respectively.

II. Inter-View Constraint Propagation 

In this section, we first formulate inter-view constraint propagation as
minimizing a regularized energy functional from a semi-supervised learning
viewpoint. Furthermore, we develop an efficient algorithm for inter-view
constraint propagation based on the label propagation technique [?].

A. Problem Formulation 

Given a set of inter-view pairwise constraints defined over pairs of data
points from different views, the goal of inter-view constraint propagation is
to learn the cross-view relationships from these initial pairwise
constraints. Since the similarity of two data points from different views is
unknown in practice, inter-view constraint propagation on multi-view data is
much more challenging than the traditional pairwise constraint propagation
over a single view. Considering that this multi-view problem can be readily
decomposed into a series of two-view subproblems, we focus on inter-view
constraint propagation on two-view data in the following.

Let $\{\mathcal{X}, \mathcal{Y}\}$ be a two-view dataset, where
$\mathcal{X} = \{x_1, ..., x_N\}$ and $\mathcal{Y} = \{y_1, ..., y_M\}$. It
should be noted that we may have $N \neq M$. As an example, a two-view
dataset is shown in Fig. [?] with image and text being the two different
views. For the two-view dataset $\{\mathcal{X}, \mathcal{Y}\}$, we can define
a set of initial must-link constraints as
$\mathcal{M} = \{(x_i, y_j) : l(x_i) = l(y_j)\}$ and a set of initial
cannot-link constraints as $\mathcal{C} = \{(x_i, y_j) : l(x_i) \neq l(y_j)\}$,
where $l(x_i)$ (or $l(y_j)$) is the class label of $x_i \in \mathcal{X}$ (or
$y_j \in \mathcal{Y}$). Here, the two data points $x_i$ and $y_j$ are assumed
to share the same class label set. If the class labels are not provided, the
inter-view pairwise constraints can be defined only based on the
correspondence between two views, which can be readily obtained from
Web-based content (e.g. Wikipedia articles). Several examples of inter-view
pairwise constraints are illustrated in Fig. 1.

We can now state that the goal of inter-view constraint propagation is to
propagate the two sets of initial pairwise constraints $\mathcal{M}$ and
$\mathcal{C}$ across both $\mathcal{X}$ and $\mathcal{Y}$. In fact, this is
equivalent to deriving the best solution $F^* \in \mathcal{F}$ from both
$\mathcal{M}$ and $\mathcal{C}$, with
$\mathcal{F} = \{F = \{f_{ij}\}_{N \times M}\}$. Here, any exhaustive set of
inter-view pairwise constraints is denoted as $F \in \mathcal{F}$, where
$f_{ij} > 0$ means $(x_i, y_j)$ is a must-link constraint while $f_{ij} < 0$
means $(x_i, y_j)$ is a cannot-link constraint, with $|f_{ij}|$ denoting the
confidence score of $(x_i, y_j)$ being a must-link (or cannot-link)
constraint. Hence, $\mathcal{F}$ can actually be regarded as the feasible
solution set of inter-view constraint propagation.

Although it is difficult to directly find the best solution
$F^* \in \mathcal{F}$ to inter-view constraint propagation, we can tackle
this challenging problem by decomposing it into a set of independent
semi-supervised learning subproblems. More concretely, we first denote the
two sets of initial pairwise constraints $\mathcal{M}$ and $\mathcal{C}$ with
a single matrix $Z = \{z_{ij}\}_{N \times M}$:

$$z_{ij} = \begin{cases} +1, & (x_i, y_j) \in \mathcal{M}; \\ -1, & (x_i, y_j) \in \mathcal{C}; \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
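
As a small illustration of how the initial matrix $Z$ can be assembled from
class labels, the following sketch may help. It is not taken from the paper:
the function name and the convention that a label of -1 marks an unlabeled
point are assumptions made only for this example.

```python
import numpy as np

def build_inter_view_constraints(labels_x, labels_y, unlabeled=-1):
    """Return the N x M matrix Z of Eq. (1).

    labels_x, labels_y: integer class labels of the two views. Entries equal
    to `unlabeled` (a convention assumed here) mark points without labels.
    """
    lx = np.asarray(labels_x)[:, None]   # shape (N, 1)
    ly = np.asarray(labels_y)[None, :]   # shape (1, M)
    known = (lx != unlabeled) & (ly != unlabeled)
    Z = np.where(lx == ly, 1.0, -1.0)    # +1 must-link, -1 cannot-link
    Z[~known] = 0.0                      # no constraint for this pair
    return Z

# Three points in view X, two in view Y; one point per view is unlabeled.
Z = build_inter_view_constraints([0, 1, -1], [0, -1])
```

Each column (or row) of the resulting matrix can then be read as the label
vector of one two-class semi-supervised learning subproblem.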


Moreover, by making vertical and horizontal observations on such initial
matrix $Z$, we decompose the inter-view constraint propagation problem into
independent semi-supervised learning subproblems, which is also illustrated
in Fig. 2. Finally, given two graphs
$\mathcal{G}_X = \{\mathcal{X}, W_X\}$ and
$\mathcal{G}_Y = \{\mathcal{Y}, W_Y\}$ constructed over
$\{\mathcal{X}, \mathcal{Y}\}$ with $W_X$ (or $W_Y$) being the edge weight
matrix defined over the vertex set $\mathcal{X}$ (or $\mathcal{Y}$), we
utilize the graph-based label propagation method [?] to uniformly solve these
semi-supervised learning subproblems:

$$\min_{F_X, F_Y} \|F_X - Z\|_{fro}^2 + \mu_X \mathrm{tr}(F_X^T \mathcal{L}_X F_X) + \|F_Y - Z\|_{fro}^2 + \mu_Y \mathrm{tr}(F_Y \mathcal{L}_Y F_Y^T) + \gamma \|F_X - F_Y\|_{fro}^2, \tag{2}$$

where $\mu_X > 0$ ($\mu_Y > 0$, or $\gamma > 0$) denotes the regularization
parameter, $\mathcal{L}_X$ (or $\mathcal{L}_Y$) denotes the normalized
Laplacian matrix defined over $\mathcal{X}$ (or $\mathcal{Y}$),
$\|\cdot\|_{fro}$ denotes the Frobenius norm of a matrix, and
$\mathrm{tr}(\cdot)$ denotes the trace of a matrix.

Fig. 2. Illustration of the initial matrix Z. When we focus on a single pair
of data points, e.g. (X3, yf) here, the inter-view constraint propagation can 
be viewed as a two-class semi-supervised learning problem (in name only) in 
both vertical and horizontal directions, where +1 (or -1) denotes positive (or 
negative) labeled data and 0 denotes unlabeled data. 

The first and second terms of the above objective function are related to the
pairwise constraint propagation over $\mathcal{X}$, while the third and
fourth terms are related to the pairwise constraint propagation over
$\mathcal{Y}$. Moreover, the fifth term ensures that the solutions of these
two types of pairwise constraint propagation are as close to each other as
possible. Let $F_X^*$ and $F_Y^*$ be the best solutions of pairwise
constraint propagation over $\mathcal{X}$ and $\mathcal{Y}$, respectively.
The best solution of our inter-view constraint propagation is defined as
follows:

$$F^* = (F_X^* + F_Y^*)/2. \tag{3}$$

As for the second and fourth terms, they are known as the energy functional
[?] (or smoothness) defined over $\mathcal{X}$ and $\mathcal{Y}$. In summary,
we have formulated inter-view constraint propagation as minimizing a
regularized energy functional.

B. Efficient Algorithm 

Let $Q(F_X, F_Y)$ denote the objective function in equation (2). The
alternate optimization technique can be adopted to solve
$\min_{F_X, F_Y} Q(F_X, F_Y)$ as follows: 1) Fix $F_Y = F_Y^*$, and find
$F_X^* = \arg\min_{F_X} Q(F_X, F_Y^*)$; 2) Fix $F_X = F_X^*$, and find
$F_Y^* = \arg\min_{F_Y} Q(F_X^*, F_Y)$.

Pairwise Constraint Propagation over $\mathcal{X}$: When $F_Y$ is fixed at
$F_Y^*$, the solution of $\min_{F_X} Q(F_X, F_Y^*)$ can be found by solving
the following linear equation

$$\frac{1}{2} \frac{\partial Q(F_X, F_Y^*)}{\partial F_X} = (F_X - Z) + \mu_X \mathcal{L}_X F_X + \gamma (F_X - F_Y^*) = 0,$$

which can be equivalently transformed into:

$$(I + \hat{\mu}_X \mathcal{L}_X) F_X = (1 - \beta) Z + \beta F_Y^*, \tag{4}$$

where $\hat{\mu}_X = \mu_X / (1 + \gamma)$ and $\beta = \gamma / (1 + \gamma)$.
Since $I + \hat{\mu}_X \mathcal{L}_X$ is positive definite, we then obtain an
analytical solution:

$$F_X^* = (I + \hat{\mu}_X \mathcal{L}_X)^{-1} ((1 - \beta) Z + \beta F_Y^*). \tag{5}$$

However, this analytical solution is not efficient for large datasets, since
matrix inversion has a time cost of $O(N^3)$. Fortunately, the solution of
equation (4) can also be efficiently found using label propagation [?] with a
k-nearest neighbor (k-NN) graph.

Pairwise Constraint Propagation over $\mathcal{Y}$: When $F_X$ is fixed at
$F_X^*$, the solution of $\min_{F_Y} Q(F_X^*, F_Y)$ can be found by solving
the following linear equation

$$\frac{1}{2} \frac{\partial Q(F_X^*, F_Y)}{\partial F_Y} = (F_Y - Z) + \mu_Y F_Y \mathcal{L}_Y + \gamma (F_Y - F_X^*) = 0,$$

which can be equivalently transformed into:

$$F_Y (I + \hat{\mu}_Y \mathcal{L}_Y) = (1 - \beta) Z + \beta F_X^*, \tag{6}$$

where $\hat{\mu}_Y = \mu_Y / (1 + \gamma)$ and $\beta = \gamma / (1 + \gamma)$.
Since $I + \hat{\mu}_Y \mathcal{L}_Y$ is positive definite, we then obtain an
analytical solution:

$$F_Y^* = ((1 - \beta) Z + \beta F_X^*) (I + \hat{\mu}_Y \mathcal{L}_Y)^{-1}, \tag{7}$$

which involves time-consuming matrix inversion. In fact, the linear equation
(6) can also be efficiently solved using label propagation [?] with a k-NN
graph.
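
For small problems, the two analytical solutions can be evaluated directly,
which is useful as a reference when checking a faster implementation. The
sketch below alternates between equations (5) and (7) with dense linear
solves. The function names, default parameter values and the fixed number of
outer iterations are assumptions for illustration, not part of the paper.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric nonnegative weight matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated vertices
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def alternate_closed_form(Z, Wx, Wy, mu_x=1.0, mu_y=1.0, gamma=1.0, n_outer=50):
    """Alternate between Eq. (5) and Eq. (7) using dense linear solves."""
    N, M = Z.shape
    beta = gamma / (1.0 + gamma)
    Ax = np.eye(N) + mu_x / (1.0 + gamma) * normalized_laplacian(Wx)
    Ay = np.eye(M) + mu_y / (1.0 + gamma) * normalized_laplacian(Wy)
    Fx = np.zeros((N, M))
    Fy = np.zeros((N, M))
    for _ in range(n_outer):
        # Eq. (5): Ax Fx = (1 - beta) Z + beta Fy
        Fx = np.linalg.solve(Ax, (1.0 - beta) * Z + beta * Fy)
        # Eq. (7): Fy Ay = (1 - beta) Z + beta Fx, solved via the transpose
        Fy = np.linalg.solve(Ay.T, ((1.0 - beta) * Z + beta * Fx).T).T
    return Fx, Fy, 0.5 * (Fx + Fy)  # the last item is F* of Eq. (3)
```

Each solve costs roughly cubic time in the view size, which is exactly the
cost that the label propagation iteration is meant to avoid.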

Let $W_X$ (or $W_Y$) denote the weight matrix of the k-NN graph constructed
over $\mathcal{X}$ (or $\mathcal{Y}$). The complete algorithm for inter-view
constraint propagation is summarized as follows:

(1) Compute two matrices $S_X = D_X^{-1/2} W_X D_X^{-1/2}$ and
$S_Y = D_Y^{-1/2} W_Y D_Y^{-1/2}$, where $D_X$ (or $D_Y$) is a diagonal
matrix with its i-th diagonal entry being the sum of the i-th row of
$W_X$ (or $W_Y$);

(2) Initialize $F_X(0) = 0$, $F_Y^* = 0$, and $F_Y(0) = 0$;

(3) Iterate
$F_X(t+1) = \alpha_X S_X F_X(t) + (1 - \alpha_X)((1 - \beta) Z + \beta F_Y^*)$
until convergence at $F_X^*$, where
$\alpha_X = \hat{\mu}_X / (1 + \hat{\mu}_X)$ and
$\beta = \gamma / (1 + \gamma)$;

(4) Iterate
$F_Y(t+1) = \alpha_Y F_Y(t) S_Y + (1 - \alpha_Y)((1 - \beta) Z + \beta F_X^*)$
until convergence at $F_Y^*$, where
$\alpha_Y = \hat{\mu}_Y / (1 + \hat{\mu}_Y)$;

(5) Iterate Steps (3)-(4) until convergence, and output the final solution
$F^* = (F_X^* + F_Y^*)/2$.

According to the convergence analysis in [?], Step (3) converges to
$F_X^* = (1 - \alpha_X)(I - \alpha_X S_X)^{-1}((1 - \beta) Z + \beta F_Y^*)$,
equal to the solution (5) given that
$\alpha_X = \hat{\mu}_X / (1 + \hat{\mu}_X)$ and $S_X = I - \mathcal{L}_X$.
Similarly, Step (4) converges to
$F_Y^* = (1 - \alpha_Y)((1 - \beta) Z + \beta F_X^*)(I - \alpha_Y S_Y)^{-1}$,
equal to the solution (7) given that
$\alpha_Y = \hat{\mu}_Y / (1 + \hat{\mu}_Y)$ and $S_Y = I - \mathcal{L}_Y$.
In the experiments, we find that Steps (3)-(5) generally converge in fewer
than 10 iterations. Moreover, based on k-NN graphs, the above inter-view
constraint propagation algorithm has a time cost of $O(kNM)$, which is
proportional to the number of all possible inter-view pairwise constraints.
Hence, we consider that this algorithm can provide an efficient solution to
inter-view constraint propagation (note that even a simple assignment
operator on $F^*$ incurs a time cost of $O(NM)$).
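
The five steps above can be sketched in a few lines of NumPy. This is a
minimal illustration under stated assumptions rather than the authors'
implementation: it uses dense matrices (a sparse k-NN weight matrix would be
needed to actually reach the $O(kNM)$ cost), and the function names,
iteration caps and tolerance are hypothetical.

```python
import numpy as np

def normalize_graph(W):
    """Step (1): S = D^{-1/2} W D^{-1/2} for a symmetric nonnegative W."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def propagate(F, S, target, alpha, left, n_inner=500, tol=1e-12):
    """Iterate F <- alpha * (S F or F S) + (1 - alpha) * target until stable."""
    for _ in range(n_inner):
        F_new = alpha * (S @ F if left else F @ S) + (1.0 - alpha) * target
        if np.abs(F_new - F).max() < tol:
            return F_new
        F = F_new
    return F

def inter_view_cp(Z, Wx, Wy, mu_x=1.0, mu_y=1.0, gamma=1.0, n_outer=30, tol=1e-12):
    Sx, Sy = normalize_graph(Wx), normalize_graph(Wy)
    beta = gamma / (1.0 + gamma)
    # alpha = mu_hat / (1 + mu_hat) with mu_hat = mu / (1 + gamma)
    alpha_x = mu_x / (1.0 + gamma + mu_x)
    alpha_y = mu_y / (1.0 + gamma + mu_y)
    Fx = np.zeros_like(Z, dtype=float)   # Step (2)
    Fy = np.zeros_like(Z, dtype=float)
    for _ in range(n_outer):             # Step (5)
        Fx_old, Fy_old = Fx, Fy
        # Step (3): propagate along the columns of Z (over view X)
        Fx = propagate(Fx, Sx, (1.0 - beta) * Z + beta * Fy, alpha_x, left=True)
        # Step (4): propagate along the rows of Z (over view Y)
        Fy = propagate(Fy, Sy, (1.0 - beta) * Z + beta * Fx, alpha_y, left=False)
        if max(np.abs(Fx - Fx_old).max(), np.abs(Fy - Fy_old).max()) < tol:
            break
    return 0.5 * (Fx + Fy), Fx, Fy
```

At convergence the returned `Fx` and `Fy` should satisfy the two linear
equations (4) and (6), which gives a simple way to validate the iteration on
small inputs.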

III. Constrained Graph Construction 

In the last section, we developed an efficient inter-view constraint
propagation algorithm based on the graph-based label propagation technique.
However, since graph-based label propagation has been adopted as the basic
optimization technique, one problem remains to be addressed in inter-view
constraint propagation, i.e., how to exploit intra-view pairwise constraints
for graph construction within each view. In this section, we develop two
constrained graph construction methods for inter-view constraint propagation,
which only differ in how the intra-view pairwise constraints are exploited.
To ensure that our inter-view constraint propagation algorithm runs
efficiently even on large datasets, we utilize the traditional k-NN graph
construction as the basis of our constrained graph construction, i.e., the
two constrained graphs obtained can be considered as variants of the k-NN
graph. In the following, we only elaborate how to construct the graph
$\mathcal{G}_X = \{\mathcal{X}, W_X\}$ over $\mathcal{X}$. The graph
$\mathcal{G}_Y = \{\mathcal{Y}, W_Y\}$ over $\mathcal{Y}$ can be constructed
in exactly the same way.


A. Constrained Weight Adjustment 


The first constrained graph construction method limits our inter-view
constraint propagation proposed in Section II to a single view (i.e.
intra-view constraint propagation over $\mathcal{X}$) and then utilizes the
obtained results of intra-view constraint propagation to adjust the weight
matrix, which is thus called constrained weight adjustment (CWA). According
to the convergence analysis in Section II-B, we construct a k-NN graph over
$\mathcal{X}$ to speed up our intra-view constraint propagation.

1) Intra-View Constraint Propagation: We have just provided a sound solution
to the challenging problem of inter-view constraint propagation in
Section II. In this subsection, we further consider pairwise constraint
propagation over a single view, where each pairwise constraint is defined
over a pair of data points from the same view. In fact, this intra-view
constraint propagation problem can also be solved from a semi-supervised
learning viewpoint by limiting our inter-view constraint propagation to a
single view.

Given the dataset $\mathcal{X} = \{x_1, ..., x_N\}$, we denote the set of
initial must-link constraints as
$\mathcal{M}_X = \{(x_i, x_j) : l_i = l_j\}$ and the set of initial
cannot-link constraints as $\mathcal{C}_X = \{(x_i, x_j) : l_i \neq l_j\}$,
where $l_i$ is the label of data point $x_i$. Similar to our representation
of the initial inter-view pairwise constraints, we first denote the initial
intra-view pairwise constraints $\mathcal{M}_X$ and $\mathcal{C}_X$ with a
single matrix $Z_X = \{z_{ij}^{(x)}\}_{N \times N}$:

$$z_{ij}^{(x)} = \begin{cases} +1, & (x_i, x_j) \in \mathcal{M}_X; \\ -1, & (x_i, x_j) \in \mathcal{C}_X; \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$


Furthermore, by making vertical and horizontal observations on $Z_X$, we
further decompose the intra-view constraint propagation problem into
semi-supervised learning subproblems, just as our interpretation of
inter-view constraint propagation from a semi-supervised learning viewpoint.
These subproblems can be similarly merged into a single optimization problem
(similar to [20]-[22]):

$$\min_{F_v, F_h} \|F_v - Z_X\|_{fro}^2 + \mu\, \mathrm{tr}(F_v^T \mathcal{L}_X F_v) + \|F_h - Z_X\|_{fro}^2 + \mu\, \mathrm{tr}(F_h \mathcal{L}_X F_h^T) + \gamma \|F_v - F_h\|_{fro}^2, \tag{9}$$

where $\mu > 0$ (or $\gamma > 0$) denotes the regularization parameter, and
$\mathcal{L}_X$ denotes the normalized Laplacian matrix defined over the k-NN
graph. The second and fourth terms of the above equation denote the energy
functional [10] (or the smoothness measure) defined over $\mathcal{X}$. In
summary, we have also formulated intra-view constraint propagation as
minimizing a regularized energy functional.

Similar to what we have done for solving equation (2), we can adopt the
alternate optimization technique to find the best solution to the above
intra-view constraint propagation problem. Let $W_X$ denote the weight matrix
of the k-NN graph constructed over the dataset $\mathcal{X}$. The proposed
algorithm for our intra-view constraint propagation is outlined as follows:

(1) Compute $S_X = D^{-1/2} W_X D^{-1/2}$, where $D$ is a diagonal matrix
with its entry $(i, i)$ being the sum of row $i$ of $W_X$;

(2) Initialize $F_v(0) = 0$, $F_h^* = 0$, and $F_h(0) = 0$;

(3) Iterate
$F_v(t+1) = \alpha S_X F_v(t) + (1 - \alpha)((1 - \beta) Z_X + \beta F_h^*)$
until convergence at $F_v^*$, where $\alpha = \mu / (1 + \mu + \gamma)$ and
$\beta = \gamma / (1 + \gamma)$;

(4) Iterate
$F_h(t+1) = \alpha F_h(t) S_X + (1 - \alpha)((1 - \beta) Z_X + \beta F_v^*)$
until convergence at $F_h^*$;

(5) Iterate Steps (3)-(4) until the stopping condition is satisfied, and
obtain $F^* = (F_v^* + F_h^*)/2$;

(6) Output the normalized solution $F^* = F^* / F^*_{\max}$, where
$F^*_{\max}$ denotes the maximum entry of $F^*$.

In the experiments, we find that Steps (3)-(5) generally converge in fewer
than 10 iterations. Moreover, based on the k-NN graph, our algorithm has a
time cost of $O(kN^2)$, proportional to the number of all possible pairwise
constraints. Hence, it can be considered to provide an efficient solution.
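
The intra-view variant can be sketched in the same way, here together with a
simple k-NN graph builder. This is only an illustrative sketch: the Gaussian
kernel, its bandwidth, the fixed iteration counts and all function names are
assumptions that the text does not prescribe.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-NN graph with Gaussian weights (kernel choice is assumed)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq / (sq.mean() + 1e-12))
    np.fill_diagonal(A, 0.0)                      # no self-loops
    nearest = np.argsort(-A, axis=1)[:, :k]       # k most similar points per row
    rows = np.arange(X.shape[0])[:, None]
    W = np.zeros_like(A)
    W[rows, nearest] = A[rows, nearest]
    return np.maximum(W, W.T)                     # keep an edge if either end chose it

def intra_view_cp(Zx, W, mu=1.0, gamma=1.0, n_outer=30, n_inner=100, tol=1e-12):
    d = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = d[:, None] * W * d[None, :]               # Step (1)
    alpha = mu / (1.0 + mu + gamma)
    beta = gamma / (1.0 + gamma)
    Fv = np.zeros_like(Zx, dtype=float)           # Step (2)
    Fh = np.zeros_like(Zx, dtype=float)
    for _ in range(n_outer):                      # Step (5)
        Fv_old, Fh_old = Fv, Fh
        target = (1.0 - beta) * Zx + beta * Fh
        for _ in range(n_inner):                  # Step (3): vertical propagation
            Fv = alpha * (S @ Fv) + (1.0 - alpha) * target
        target = (1.0 - beta) * Zx + beta * Fv
        for _ in range(n_inner):                  # Step (4): horizontal propagation
            Fh = alpha * (Fh @ S) + (1.0 - alpha) * target
        if max(np.abs(Fv - Fv_old).max(), np.abs(Fh - Fh_old).max()) < tol:
            break
    F = 0.5 * (Fv + Fh)
    f_max = F.max()
    return F / f_max if f_max > 0 else F          # Step (6)
```

With a symmetric constraint matrix and a symmetric graph, the output is
symmetric up to the stopping tolerance, which matches the requirement that
the adjusted weight matrix be symmetric.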

2) Weight Adjustment Using Propagated Constraints: It should be noted that
the normalized output $F^* = \{f_{ij}^*\}_{N \times N}$ of our intra-view
constraint propagation represents an exhaustive set of intra-view pairwise
constraints. Our original motivation is to construct a new graph over
$\mathcal{X}$ that is fully consistent with $F^*$. In fact, we can exploit
$F^*$ for such graph construction by adjusting the original normalized weight
matrix $W_X$ (i.e. $0 \le w_{ij}^{(x)} \le 1$) just as [20]:

$$\tilde{w}_{ij}^{(x)} = \begin{cases} 1 - (1 - f_{ij}^*)(1 - w_{ij}^{(x)}), & f_{ij}^* \ge 0; \\ (1 + f_{ij}^*)\, w_{ij}^{(x)}, & f_{ij}^* < 0. \end{cases} \tag{10}$$

Since $\tilde{W}_X = \{\tilde{w}_{ij}^{(x)}\}_{N \times N}$ is nonnegative
and symmetric, we then use it as the new weight matrix. Moreover, we can find
that $\tilde{w}_{ij}^{(x)} \ge w_{ij}^{(x)}$ (or $< w_{ij}^{(x)}$) if
$f_{ij}^* \ge 0$ (or $< 0$). That is, the new weight matrix $\tilde{W}_X$ is
derived from the original weight matrix $W_X$ by increasing $w_{ij}^{(x)}$
for the must-link constraints with $f_{ij}^* > 0$ and decreasing
$w_{ij}^{(x)}$ for the cannot-link constraints with $f_{ij}^* < 0$. This is
entirely consistent with our original motivation of exploiting intra-view
pairwise constraints for graph construction.
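
The weight adjustment rule is elementwise, so it can be written as a single
vectorized expression. The sketch below assumes that both the weights and the
propagated constraints have already been normalized (weights in [0, 1],
constraints in [-1, 1]); the function name is hypothetical.

```python
import numpy as np

def adjust_weights(W, F):
    """Apply the constrained weight adjustment elementwise.

    W: original normalized weights in [0, 1].
    F: normalized propagated constraints in [-1, 1].
    """
    W = np.asarray(W, dtype=float)
    F = np.asarray(F, dtype=float)
    increased = 1.0 - (1.0 - F) * (1.0 - W)   # used where F >= 0 (must-link side)
    decreased = (1.0 + F) * W                 # used where F < 0 (cannot-link side)
    return np.where(F >= 0, increased, decreased)

W_new = adjust_weights([[0.5, 0.2], [0.2, 0.5]], [[1.0, -0.5], [-0.5, 0.0]])
```

A constraint score of 0 leaves the weight unchanged, a score of +1 pushes it
to 1, and a score of -1 pushes it to 0.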

Once we have constructed the new weight matrix $\tilde{W}_X$ over
$\mathcal{X}$, we can similarly construct the new weight matrix
$\tilde{W}_Y$ over $\mathcal{Y}$. Based on these two new weight matrices, our
inter-view constraint propagation can be performed with constrained graph
construction (CGC) (as shown in Fig. 1) using constrained weight adjustment
(CWA) developed here.


B. Constrained Sparse Representation 

The second constrained graph construction method formulates graph
construction as sparse representation [?]-[?] and then directly adds the
intra-view pairwise constraints into sparse representation, which is thus
called constrained sparse representation (CSR). Our work is mainly inspired
by recent efforts to exploit sparse representation for graph construction,
i.e., $L_1$-graph construction [?], [?]. The basic idea of $L_1$-graph
construction is to seek a sparse linear reconstruction of each data point
with the other data points. However, such $L_1$-graph construction may become
infeasible since it incurs too much time cost given a large data size $N$.
Hence, we only consider the k nearest neighbors of each data point for its
sparse linear reconstruction, which thus becomes a much smaller scale
optimization problem ($k \ll N$). More notably, due to such neighborhood
limitation, the obtained $L_1$-graph is actually a variant of the k-NN graph,
which can ensure that our inter-view constraint propagation proposed in
Section II runs efficiently on large datasets. Finally, to exploit intra-view
pairwise constraints for $L_1$-graph construction, we seek a constrained
sparse linear reconstruction of each data point.

1) $L_1$-Graph Construction with Sparse Representation: We start with the
problem formulation for sparse linear reconstruction of each data point in
its k-nearest neighborhood. Given a data point $x_i \in \mathcal{X}$, we
suppose it can be reconstructed using its k-nearest neighbors (their indices
are collected into $\mathcal{N}_k(i)$), which results in an underdetermined
linear system: $x_i = B_i \alpha_i$, where $\alpha_i \in R^k$ is a vector
that stores unknown reconstruction coefficients, and
$B_i = [x_j]_{j \in \mathcal{N}_k(i)}$ is an overcomplete dictionary with k
bases. According to [?], if the solution for $x_i$ is sparse enough, it can
be recovered by:

$$\min_{\alpha_i} \|\alpha_i\|_1, \quad \text{s.t. } x_i = B_i \alpha_i, \tag{11}$$

where $\|\alpha_i\|_1$ is the $L_1$-norm of $\alpha_i$. Given the kernel
(affinity) matrix $A = \{a_{ij}\}_{N \times N}$ computed over $\mathcal{X}$,
we make use of the kernel trick and transform the above problem into:

$$\min_{\alpha_i} \|\alpha_i\|_1, \quad \text{s.t. } \hat{x}_i = C_i \alpha_i, \tag{12}$$

where $\hat{x}_i = [a_{ji}]_{j \in \mathcal{N}_k(i)} \in R^k$ and
$C_i = [a_{jj'}]_{j, j' \in \mathcal{N}_k(i)} \in R^{k \times k}$.

In practice, due to the noise in the data, we can reconstruct $\hat{x}_i$
similar to [24]: $\hat{x}_i = C_i \alpha_i + \zeta_i$, where $\zeta_i$ is the
noise term. The above $L_1$-optimization problem can then be redefined by
minimizing the $L_1$-norm of both reconstruction coefficients and
reconstruction error:

$$\min_{\alpha'_i} \|\alpha'_i\|_1, \quad \text{s.t. } \hat{x}_i = C'_i \alpha'_i, \tag{13}$$

where $C'_i = [C_i, I] \in R^{k \times 2k}$ and
$\alpha'_i = [\alpha_i^T, \zeta_i^T]^T$. This convex optimization can be
solved by general linear programming and has a globally optimal solution.
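
For readers who want to see why the noisy reconstruction problem is a linear
program, the standard variable-splitting argument is sketched below. The
auxiliary vectors $u$ and $v$ are introduced here only for illustration and
do not appear in the paper.

```latex
% Split the unknown into nonnegative parts: \alpha'_i = u - v with u, v \ge 0.
% At an optimum, u and v have disjoint supports, so \|\alpha'_i\|_1 = \mathbf{1}^T (u + v).
\min_{u, v \in R^{2k}} \; \mathbf{1}^T (u + v)
\quad \text{s.t.} \quad C'_i (u - v) = \hat{x}_i, \quad u \ge 0, \; v \ge 0 .
```

The objective and all constraints are linear in $(u, v)$, so any general
linear programming solver applies, and $\alpha'_i$ is recovered as $u - v$.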

After we have obtained the reconstruction coefficients for all the data
points by the above sparse linear reconstruction, the weight matrix
$W_X = \{w_{ij}^{(x)}\}_{N \times N}$ can be defined by:

$$w_{ij}^{(x)} = \begin{cases} |\alpha'_i(j')|, & j \in \mathcal{N}_k(i),\; j' = \mathrm{index}(j, \mathcal{N}_k(i)); \\ 0, & \text{otherwise,} \end{cases} \tag{14}$$

where $\alpha'_i(j')$ denotes the $j'$-th element of the vector $\alpha'_i$,
and $j' = \mathrm{index}(j, \mathcal{N}_k(i))$ means that $j$ is the $j'$-th
element of the set $\mathcal{N}_k(i)$. By setting the weight matrix
$W_X = (W_X + W_X^T)/2$, we construct a graph
$\mathcal{G}_X = \{\mathcal{X}, W_X\}$ over $\mathcal{X}$, which is called
the $L_1$-graph since it is constructed by $L_1$-optimization.
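
Turning the per-point coefficient vectors into a symmetric weight matrix can
be sketched as follows. The sketch assumes that weights are the magnitudes of
the reconstruction coefficients (so that they are nonnegative) and that the
first k entries of each solution vector are the reconstruction coefficients;
the function name and input layout are hypothetical.

```python
import numpy as np

def l1_graph_weights(neighbors, coeffs, n):
    """Build the symmetrized weight matrix from sparse reconstruction results.

    neighbors[i]: list of the k neighbor indices of point i.
    coeffs[i]:    solution vector for point i; its first k entries are the
                  reconstruction coefficients, any remaining entries are
                  noise terms and are ignored here.
    """
    W = np.zeros((n, n))
    for i, (nbrs, a) in enumerate(zip(neighbors, coeffs)):
        k = len(nbrs)
        W[i, nbrs] = np.abs(np.asarray(a, dtype=float)[:k])
    return 0.5 * (W + W.T)   # symmetrize: W = (W + W^T) / 2

neighbors = [[1, 2], [0, 2], [0, 1]]
coeffs = [[0.5, -0.2, 0.0, 0.1], [0.4, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
W = l1_graph_weights(neighbors, coeffs, 3)
```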

2) $L_1$-Norm Laplacian Regularization with Intra-View Pairwise Constraints:
In the above $L_1$-graph construction, we have ignored intra-view pairwise
constraints (see examples in Fig. 1). In fact, this supervisory information
can be exploited for $L_1$-graph construction through Laplacian
regularization [?], [?]. Our basic idea is to first derive Laplacian
regularization from intra-view pairwise constraints and then incorporate this
constrained term into sparse linear reconstruction (the key step of
$L_1$-graph construction). In the following, we will first elaborate how to
derive a new Laplacian regularization term from intra-view pairwise
constraints.

Given a set of intra-view must-link constraints $\mathcal{M}_X$ and a set of
intra-view cannot-link constraints $\mathcal{C}_X$ defined over
$\mathcal{X}$, we can represent both $\mathcal{M}_X$ and $\mathcal{C}_X$
using a single matrix $Z_X = \{z_{ij}^{(x)}\}_{N \times N}$ exactly the same
as equation (8). The normalized Laplacian matrix limited to the k-nearest
neighborhood of data point $x_i$ can thus be defined as:

$$\mathcal{L}_i = I - D_i^{-1/2} (1 + Z_i) D_i^{-1/2}, \tag{15}$$

where $Z_i = [z_{jj'}^{(x)}]_{j, j' \in \mathcal{N}_k(i)} \in R^{k \times k}$,
and $D_i$ is a diagonal matrix with its j-th diagonal element being the sum
of the j-th row of $1 + Z_i$. Here, we define the similarity matrix (i.e.
$1 + Z_i$) limited to the k-nearest neighborhood $\mathcal{N}_k(i)$ of $x_i$
based on the intra-view pairwise constraints stored in $Z_X$. From this
normalized Laplacian matrix $\mathcal{L}_i$, we can derive the Laplacian
regularization term for the sparse representation problem (12) as
$\alpha_i^T \mathcal{L}_i \alpha_i$, the same as the original definition in
[?].

However, we have difficulty in directly incorporating this Laplacian
regularization term into the sparse representation problem (12), whether as a
part of the objective function or as a constraint condition. Hence, we
further formulate an $L_1$-norm version of Laplacian regularization [?],
[?]-[?]:

$$\|\tilde{C}_i \alpha_i\|_1 = \|\Sigma_i^{1/2} V_i^T \alpha_i\|_1, \tag{16}$$

where $\tilde{C}_i = \Sigma_i^{1/2} V_i^T$, $V_i$ is a $k \times k$
orthonormal matrix with each column being an eigenvector of $\mathcal{L}_i$,
and $\Sigma_i$ is a $k \times k$ diagonal matrix with its diagonal element
$\Sigma_i(j, j)$ being an eigenvalue of $\mathcal{L}_i$ (sorted as
$\Sigma_i(1, 1) \le ... \le \Sigma_i(k, k)$). Given that $\mathcal{L}_i$ is
nonnegative definite, $\Sigma_i \ge 0$ (i.e. all the eigenvalues are
$\ge 0$). Since $\mathcal{L}_i V_i = V_i \Sigma_i$ and $V_i$ is orthonormal,
we have $\mathcal{L}_i = V_i \Sigma_i V_i^T$. Hence, the original Laplacian
regularization $\alpha_i^T \mathcal{L}_i \alpha_i$ can be reformulated as:

$$\alpha_i^T \mathcal{L}_i \alpha_i = \alpha_i^T V_i \Sigma_i^{1/2} \Sigma_i^{1/2} V_i^T \alpha_i = \|\tilde{C}_i \alpha_i\|_2^2, \tag{17}$$

which means that our new formulation $\|\tilde{C}_i \alpha_i\|_1$ can indeed
be regarded as an $L_1$-norm version of the original Laplacian regularization
$\alpha_i^T \mathcal{L}_i \alpha_i = \|\tilde{C}_i \alpha_i\|_2^2$.
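
The factorization behind this reformulation is easy to check numerically.
The sketch below builds the local Laplacian from a small constraint block and
verifies that the quadratic form equals the squared 2-norm of the factored
term. It reads the "1" in the similarity matrix as the all-ones matrix, which
is an assumption, and the function names are hypothetical.

```python
import numpy as np

def local_laplacian(Zi):
    """Normalized Laplacian of the local similarity matrix 1 + Z_i."""
    sim = 1.0 + np.asarray(Zi, dtype=float)   # all-ones matrix plus constraints
    d_inv_sqrt = 1.0 / np.sqrt(sim.sum(axis=1))
    return np.eye(sim.shape[0]) - d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]

def l1_factor(L):
    """Return C_tilde = Sigma^{1/2} V^T, so that L = C_tilde^T C_tilde."""
    evals, V = np.linalg.eigh(L)              # eigenvalues in ascending order
    evals = np.clip(evals, 0.0, None)         # remove tiny negative round-off
    return np.sqrt(evals)[:, None] * V.T

# One must-link and one cannot-link constraint among three neighbors.
Zi = np.array([[0.0, 1.0, -1.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
L = local_laplacian(Zi)
C_tilde = l1_factor(L)
a = np.array([0.3, -0.2, 0.5])
quadratic = a @ L @ a                          # original Laplacian regularizer
l1_version = np.abs(C_tilde @ a).sum()         # its L1-norm counterpart
```

The two regularizers generally take different values; what they share is the
factor $\tilde{C}_i \alpha_i$, measured in the squared 2-norm in one case and
in the 1-norm in the other.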

3) $L_1$-Graph Construction with $L_1$-Norm Laplacian Regularization: After
we have formulated $L_1$-norm Laplacian regularization based on intra-view
pairwise constraints, we can further incorporate this constrained term into
sparse linear reconstruction used for $L_1$-graph construction. More
concretely, by introducing noise terms for linear reconstruction and
$L_1$-norm Laplacian regularization, we transform the sparse representation
problem (12) into

$$\min_{\alpha_i, \zeta_i, \xi_i} \|[\alpha_i^T, \zeta_i^T, \xi_i^T]\|_1, \quad \text{s.t. } \hat{x}_i = C_i \alpha_i + \zeta_i, \; 0 = \tilde{C}_i \alpha_i + \xi_i, \tag{18}$$

where the reconstruction error and Laplacian regularization with respect to
$\alpha_i$ are controlled by $\zeta_i$ and $\xi_i$, respectively. Let
$\alpha'_i = [\alpha_i^T, \zeta_i^T, \xi_i^T]^T$,
$C'_i = \begin{bmatrix} C_i & I & 0 \\ \tilde{C}_i & 0 & I \end{bmatrix}$,
and $\hat{x}'_i = [\hat{x}_i^T, 0^T]^T$. We finally solve the following
constrained sparse representation problem for $L_1$-graph construction:

$$\min_{\alpha'_i} \|\alpha'_i\|_1, \quad \text{s.t. } \hat{x}'_i = C'_i \alpha'_i, \tag{19}$$

which takes the same form as the original sparse representation problem (12).
Here, it is noteworthy that this constrained sparse representation (CSR)
problem can be solved very efficiently, since it is limited to the k-nearest
neighborhood. The weight matrix $W_X$ of the $L_1$-graph
$\mathcal{G}_X = \{\mathcal{X}, W_X\}$ can be defined the same as
equation (14).
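
Assembling the stacked linear system of the CSR problem is mostly
bookkeeping, as the following sketch shows. The block layout with a third
column block for the second noise term is inferred here from the dimensions
of the stacked unknown, and the function name is hypothetical.

```python
import numpy as np

def build_csr_system(C, C_tilde, x_hat):
    """Stack reconstruction and regularization constraints into one system.

    C:       k x k kernel block of the neighborhood.
    C_tilde: k x k factor of the local Laplacian.
    x_hat:   length-k kernel vector of the point being reconstructed.
    Returns (C_aug, x_aug) with C_aug of shape 2k x 3k and x_aug of length 2k,
    so that C_aug @ [alpha, zeta, xi] = x_aug encodes both constraints.
    """
    k = C.shape[0]
    I, O = np.eye(k), np.zeros((k, k))
    C_aug = np.block([[C, I, O], [C_tilde, O, I]])
    x_aug = np.concatenate([x_hat, np.zeros(k)])
    return C_aug, x_aug
```

Any feasible triple (coefficients, reconstruction noise, regularization
noise) satisfies the stacked equality, so the same $L_1$ solver used for the
unconstrained problem can be reused unchanged.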

In our CSR formulation, the $L_1$-norm Laplacian regularization can be
smoothly incorporated into the original sparse representation problem (12).
However, this is not true for the traditional Laplacian regularization [?],
[?], which may introduce extra parameters (hard to tune in practice) into the
$L_1$-optimization for sparse representation. Meanwhile, our $L_1$-norm
Laplacian regularization can induce another type of sparsity (see the extra
noise term $\xi_i$), which cannot be ensured by the traditional Laplacian
regularization. Moreover, the p-Laplacian regularization [?] can also be
regarded as an ordinary $L_1$-generalization of the Laplacian regularization
when $p = 1$. According to [32], by defining a matrix
$C_p \in R^{\frac{k(k-1)}{2} \times k}$, the p-Laplacian regularization can
be formulated as $\|C_p \alpha_i\|_1$, similar to our $L_1$-norm Laplacian
regularization. Hence, we can similarly apply the p-Laplacian regularization
with $p = 1$ to constrained sparse representation. However, such Laplacian
regularization incurs a large time cost due to the large matrix $C_p$ even
for a small neighborhood size (e.g. $k = 90$).

Once we have constructed the $L_1$-graph
$\mathcal{G}_X = \{\mathcal{X}, W_X\}$ over $\mathcal{X}$, we can similarly
construct the $L_1$-graph $\mathcal{G}_Y = \{\mathcal{Y}, W_Y\}$ over
$\mathcal{Y}$. Based on the two weight matrices, our inter-view constraint
propagation can be performed with constrained graph construction (CGC) (as
shown in Fig. 1) using constrained sparse representation (CSR) developed
here.

IV. Application to Cross-View Retrieval 

When multiple views refer to text, image, audio and so 
on (see Fig. O, the output of our inter-view constraint prop¬ 
agation actually can be viewed as the correlation between 
different media views. As we have mentioned, given the output
$F^* = \{f^*_{ij}\}_{N \times M}$ of our inter-view constraint propagation,
$(x_i, y_j)$ denotes a must-link (or cannot-link) constraint if
$f^*_{ij} > 0$ (or $< 0$). Considering the inherent meanings of
must-link and cannot-link constraints, we can state that $x_i$
and $y_j$ are "positively correlated" if $f^*_{ij} > 0$, while they are
"negatively correlated" if $f^*_{ij} < 0$. Hence, we can view $f^*_{ij}$
as the correlation coefficient between $x_i$ and $y_j$. The distinct
advantage of interpreting $F^*$ as a correlation measure
is that $F^*$ can be used for ranking on $\mathcal{Y}$ given a query
$x_i$ or ranking on $\mathcal{X}$ given a query $y_j$. In fact, this is just the
goal of cross-view retrieval, which has drawn much attention
recently [15]. That is, this task can be directly handled by
our inter-view constraint propagation.
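
The ranking step described above amounts to sorting one row or one column of the output matrix in decreasing order. A minimal sketch, assuming the propagated matrix is available as a NumPy array (the function names and the toy matrix are hypothetical):

```python
import numpy as np

def rank_y_given_x(F, i):
    """Indices of the second-view items, most correlated with query x_i first."""
    return np.argsort(-F[i, :], kind="stable")

def rank_x_given_y(F, j):
    """Indices of the first-view items, most correlated with query y_j first."""
    return np.argsort(-F[:, j], kind="stable")

# Toy 2 x 3 output matrix: positive entries behave like must-links,
# negative entries like cannot-links.
F_star = np.array([[0.8, -0.3, 0.1],
                   [-0.5, 0.6, 0.2]])

print(rank_y_given_x(F_star, 0))  # [0 2 1]
print(rank_x_given_y(F_star, 1))  # [1 0]
```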

In this paper, we focus on a special case of cross-view
retrieval, i.e. only text and image views are considered. In this
case, cross-view retrieval is somewhat similar to automatic
image annotation [33]-[36] and image caption generation
[37]-[39], since these three tasks all aim to learn the relations
between the text and image views. However, even if only text
and image views are considered, cross-view retrieval is still
quite different from automatic image annotation and image
caption generation. More concretely, automatic image annotation
relies on very limited types of textual representations and
mainly associates images only with textual keywords, while
cross-view retrieval is designed to deal with much more richly
annotated data, motivated by the ongoing explosion of Web-based
content such as news archives and Wikipedia pages.
Similar to cross-view retrieval, image caption generation can
also deal with more richly annotated data (i.e. captions) than
the textual keywords used in automatic image
annotation. However, this task tends to model image captions
as sentences by exploiting certain prior knowledge (e.g. the
<object, action, scene> triplets used in [37]), different from
cross-view retrieval, which focuses on associating images with
complete text articles using no prior knowledge from the
text view (in fact, any general textual representation is
applicable once the corresponding similarities are provided).

Fig. 3. Cross-view retrieval examples on the Wikipedia benchmark dataset
[15]. Here, the incorrectly retrieved images are marked with red boxes.
(Figure omitted; its two columns are titled "Text query" and "Retrieved
images by cross-view retrieval", and the three example text queries concern
the Balch Creek watershed, Puerto Rican recipients of the Distinguished
Service Cross, and a batsman named Hill.)

In the context of cross-view retrieval, one notable recent
work is [15], which first learns the correlation between the text
and image views with canonical correlation analysis (CCA)
m and then achieves the abstraction by representing text 
and image at a more general semantic level. However, two 
separate steps, i.e. correlation analysis (CA) and semantic 
abstraction (SA), are involved in this modeling, and the use of 
semantic abstraction after CCA (i.e. CA+SA) seems rather ad 
hoc. Fortunately, this problem can be completely addressed by 
our inter-view constraint propagation (Inter-CP). The semantic 
information (e.g. class labels) associated with images and text 
can be used to define the initial must-link and cannot-link 
constraints based on the training dataset, while the correlation 
between text and image views can be explicitly learnt by 
the proposed algorithm in Section III. That is, correlation
analysis and semantic abstraction have been successfully
integrated in our inter-view constraint propagation framework.
The effectiveness of such integration as compared to CA+SA
[15] is preliminarily verified by several cross-view retrieval
examples shown in Fig. 3. Further verification will be provided
in our later experiments. More notably, although only tested
in cross-view retrieval, our inter-view constraint propagation
can be readily extended to other multi-view tasks, since it has
actually learnt the correlation between different views.

Fig. 4. The cross-view retrieval results by cross-validation on the training
set of the Wikipedia dataset for our Inter-CP algorithm (CSR is used here).
(Figure omitted; its four panels are titled "$\alpha_y = 0.025$, $\beta = 0.95$,
$k = 90$", "$\alpha_x = 0.025$, $\beta = 0.95$, $k = 90$", "$\alpha_x = 0.025$,
$\alpha_y = 0.025$, $k = 90$", and "$\alpha_x = 0.025$, $\alpha_y = 0.025$,
$\beta = 0.95$".)
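
The use of class labels to define the initial must-link and cannot-link constraints can be sketched as follows. The encoding is an assumption made for illustration: +1 for a must-link (same class), -1 for a cannot-link (different classes), and 0 where either item has no label, as would be the case for test documents. The function name and the `unlabeled` marker are hypothetical.

```python
import numpy as np

def initial_constraints(labels_x, labels_y, unlabeled=-1):
    """Build an N x M inter-view constraint matrix from class labels.

    Entry (i, j) is +1 if x_i and y_j share a class label, -1 if their
    labels differ, and 0 if either label equals the `unlabeled` marker.
    """
    labels_x = np.asarray(labels_x)
    labels_y = np.asarray(labels_y)
    same = labels_x[:, None] == labels_y[None, :]
    known = (labels_x[:, None] != unlabeled) & (labels_y[None, :] != unlabeled)
    Z = np.where(same, 1.0, -1.0)
    Z[~known] = 0.0  # no supervision available for this pair
    return Z

# Three items in one view (the last one unlabeled), two in the other.
Z = initial_constraints([0, 1, -1], [1, 0])
print(Z)
```

A matrix of this kind is what the propagation step would start from; the propagated output then fills in the entries for unlabeled pairs.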

V. Experimental Results 

In this section, our inter-view constraint propagation (Inter-CP)
algorithm is evaluated in the challenging application of
cross-view retrieval. We focus on comparing our Inter-CP
algorithm with the state-of-the-art approach [15], since they
both consider not only correlation analysis (CA) but also
semantic abstraction (SA) for text and image views. Moreover,
we also compare with two other closely related
approaches that integrate CA and SA for cross-view retrieval
similarly to [15] but perform correlation analysis by partial least
squares (PLS) [41] and cross-modal factor analysis (CFA)
[42] instead of CCA, respectively. In the following, these
two CA+SA approaches are denoted as CA+SA (PLS) and
CA+SA (CFA), while the state-of-the-art approach [15] is
denoted as CA+SA (CCA). Finally, to show the effectiveness
of constrained graph construction, we construct four types of
graphs for our Inter-CP algorithm: $k$-NN graph ($k$-NN),
$L_1$-graph using sparse representation (SR), $k$-NN graph using
constrained weight adjustment (CWA), and $L_1$-graph using
constrained sparse representation (CSR).

A. Experimental Setup 

We select two different datasets for performance evaluation.
The first one is a Wikipedia benchmark dataset [15], which
contains a total of 2,866 documents derived from Wikipedia's
"featured articles". Each document is actually a text-image
pair, annotated with a label from a vocabulary of 10 semantic
classes. This benchmark dataset [15] is split into a training set
of 2,173 documents and a test set of 693 documents.
The second dataset consists of 8,564 documents in total,
crawled from the photo sharing website Flickr. The image
and text views of each document denote a photo and a set
of tags provided by the users, respectively. Although such a
text representation is not free-form like that of the
Wikipedia dataset, it is rather noisy since many of the tags
may be incorrectly annotated by the users. This Flickr dataset
is organized into 11 semantic classes. We split it into a training
set of 4,282 documents and a test set of the same size.


For the above two datasets, we take the same strategy as
[15] to generate both text and image representations. More
concretely, in the Wikipedia dataset, the text representation
for each document is derived from a latent Dirichlet allocation
model with 10 latent topics, while the image representation is
based on a bag-of-words model with 128 visual words learnt
from the extracted SIFT descriptors, just as in [15]. For
the Flickr dataset, we generate the text and image
representations similarly; the main difference is that we select a
relatively large visual vocabulary (of size 2,000) for image
representation and refine the noisy textual vocabulary to
size 1,000 by a preprocessing step for text representation.

In our experiments, the intra-view pairwise constraints used 
for our CGC and inter-view pairwise constraints used for 
our Inter-CP are initially derived from the class labels of the 
training documents of each dataset. The performance of our 
Inter-CP with CGC is evaluated on the test set. Here, two tasks 
of cross-view retrieval are considered: text retrieval using an 
image query, and image retrieval using a text query. In the 
following, these two tasks are denoted as “Image Query” and 
“Text Query”, respectively. For each task, the retrieval results 
are measured with mean average precision (MAP) which has 
been widely used in the image retrieval literature tm. 
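
For reference, mean average precision over a score matrix can be computed as in the sketch below. It assumes that a retrieved item counts as relevant when it shares the query's class label; this relevance criterion, the function names, and the toy data are assumptions made for illustration rather than details stated in this section.

```python
import numpy as np

def average_precision(scores, relevant):
    """Average precision of one ranked list (higher score = ranked earlier)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(relevant, dtype=bool)[order]
    if not rel.any():
        return 0.0  # no relevant item for this query
    hits = np.cumsum(rel)                # relevant items seen so far
    ranks = np.arange(1, rel.size + 1)   # 1-based rank positions
    return float((hits[rel] / ranks[rel]).mean())

def mean_average_precision(F, query_labels, item_labels):
    """MAP over all queries; row i of F holds the scores for query i."""
    item_labels = np.asarray(item_labels)
    aps = [average_precision(F[i], item_labels == lab)
           for i, lab in enumerate(query_labels)]
    return float(np.mean(aps))

F_toy = np.array([[0.9, 0.1, 0.5],
                  [0.2, 0.8, 0.3]])
print(mean_average_precision(F_toy, [0, 1], [0, 0, 1]))  # about 0.667
```

Transposing the score matrix gives the scores for the other retrieval direction, so the same function covers both tasks.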

Let $\mathcal{X}$ denote the text representation and $\mathcal{Y}$ denote the image
representation. For our Inter-CP algorithm, we perform CGC
over $\mathcal{X}$ and $\mathcal{Y}$ with the same $k$. The parameters of our
Inter-CP algorithm with CGC can be selected by fivefold
cross-validation on the training set. For example, according to Fig. 4,
we set the parameters of our Inter-CP (CSR is used for CGC)
on the Wikipedia dataset as: $\alpha_x = 0.025$, $\alpha_y = 0.025$,
$\beta = 0.95$, and $k = 90$. It is noteworthy that our Inter-CP
with CSR is not sensitive to these parameters. Moreover, the
parameters of our Inter-CP with CWA can be similarly set
to their respective optimal values. To summarize, we have
selected the best values for all the parameters of our Inter-CP
algorithm with CGC by cross-validation on the training set.
For fair comparison, we take the same parameter selection
strategy for other closely related algorithms.
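
The parameter selection described above follows the usual pattern of a grid search scored by fivefold cross-validation. The sketch below shows only that pattern: `score_fn` is a hypothetical callback that would fit the method on one index set and return a validation score (for example MAP) on the other, and the candidate grid is left to the caller, since the candidate values actually searched are not listed here.

```python
import itertools
import numpy as np

def kfold_indices(n, n_folds=5, seed=0):
    """Split range(n) into n_folds disjoint, shuffled index arrays."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)

def select_parameters(n_train, grid, score_fn, n_folds=5):
    """Return the grid point with the best mean validation score.

    grid: dict mapping parameter name -> list of candidate values.
    score_fn(fit_idx, val_idx, **params): hypothetical scoring callback.
    """
    folds = kfold_indices(n_train, n_folds)
    best_params, best_score = None, -np.inf
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        fold_scores = []
        for f in range(n_folds):
            val_idx = folds[f]
            fit_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            fold_scores.append(score_fn(fit_idx, val_idx, **params))
        mean_score = float(np.mean(fold_scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

With four parameters, the number of grid points multiplies quickly, so in practice one parameter is often varied at a time while the others stay fixed.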

B. Retrieval Results 

The cross-view retrieval results on the two datasets are listed
in Tables I and II, respectively. The immediate observation
is that we can achieve the best results when both intra-view
and inter-view pairwise constraints are exploited by
Inter-CP+CWA (or Inter-CP+CSR). This means that our Inter-CP
with CGC can most effectively exploit the initial supervisory
information provided for cross-view retrieval. As compared
to the three CA+SA approaches, which perform semantic abstraction
after correlation analysis (via PLS, CFA, or CCA), our Inter-CP
can seamlessly integrate these two separate steps and thus
leads to much better results. Moreover, the effectiveness of our
CGC is verified by the comparison Inter-CP+CWA vs.
Inter-CP+$k$-NN (or Inter-CP+CSR vs. Inter-CP+SR), especially on
the Flickr dataset. As for our two CGC methods, CSR is shown
to perform better than CWA, which is mainly due to the
noise-robustness property of sparse representation.

TABLE I
The cross-view retrieval results on the test set of the
Wikipedia dataset measured by the MAP scores.

Methods           Image Query   Text Query   Average
CA+SA (PLS)       0.250         0.190        0.220
CA+SA (CFA)       0.272         0.221        0.247
CA+SA (CCA)       0.277         0.226        0.252
Inter-CP+k-NN     0.329         0.256        0.293
Inter-CP+SR       0.336         0.259        0.298
Inter-CP+CWA      0.337         0.260        0.299
Inter-CP+CSR      0.343         0.268        0.306

TABLE II
The cross-view retrieval results on the test set of the
Flickr dataset measured by the MAP scores.

Methods           Image Query   Text Query   Average
CA+SA (PLS)       0.201         0.168        0.185
CA+SA (CFA)       0.252         0.231        0.242
CA+SA (CCA)       0.280         0.263        0.272
Inter-CP+k-NN     0.495         0.483        0.489
Inter-CP+SR       0.509         0.496        0.503
Inter-CP+CWA      0.521         0.499        0.510
Inter-CP+CSR      0.521         0.505        0.513
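
As a quick sanity check on the reported numbers, the sketch below copies the MAP scores from Tables I and II, confirms that each Average entry equals the mean of the two task scores up to three-decimal rounding, and computes the relative gain of Inter-CP+CSR over CA+SA (CCA). The helper names are hypothetical; the numbers are taken directly from the tables.

```python
# (image query, text query, average) MAP scores copied from Tables I and II.
TABLE_I = {
    "CA+SA (PLS)": (0.250, 0.190, 0.220),
    "CA+SA (CFA)": (0.272, 0.221, 0.247),
    "CA+SA (CCA)": (0.277, 0.226, 0.252),
    "Inter-CP+k-NN": (0.329, 0.256, 0.293),
    "Inter-CP+SR": (0.336, 0.259, 0.298),
    "Inter-CP+CWA": (0.337, 0.260, 0.299),
    "Inter-CP+CSR": (0.343, 0.268, 0.306),
}
TABLE_II = {
    "CA+SA (PLS)": (0.201, 0.168, 0.185),
    "CA+SA (CFA)": (0.252, 0.231, 0.242),
    "CA+SA (CCA)": (0.280, 0.263, 0.272),
    "Inter-CP+k-NN": (0.495, 0.483, 0.489),
    "Inter-CP+SR": (0.509, 0.496, 0.503),
    "Inter-CP+CWA": (0.521, 0.499, 0.510),
    "Inter-CP+CSR": (0.521, 0.505, 0.513),
}

def averages_consistent(table, tol=5e-4 + 1e-9):
    """True if every Average equals the mean of the two tasks up to rounding."""
    return all(abs((iq + tq) / 2.0 - avg) <= tol for iq, tq, avg in table.values())

def relative_gain(table, method, baseline):
    """Relative improvement of `method` over `baseline` in the Average column."""
    return (table[method][2] - table[baseline][2]) / table[baseline][2]

print(averages_consistent(TABLE_I), averages_consistent(TABLE_II))
print(relative_gain(TABLE_I, "Inter-CP+CSR", "CA+SA (CCA)"))
print(relative_gain(TABLE_II, "Inter-CP+CSR", "CA+SA (CCA)"))
```

On these numbers the relative gain in average MAP is roughly 21% on the Wikipedia dataset and considerably larger on the Flickr dataset.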

It should be noted that our Inter-CP algorithm can be
considered to provide an efficient solution, since it has a
time cost proportional to the number of all possible pairwise
constraints. This is also verified by our observations in the
experiments. For example, the running time taken by CA+SA
(CCA, CFA or PLS), Inter-CP+$k$-NN, and Inter-CP+CWA on
the Wikipedia dataset is 10, 24, and 55 seconds, respectively.
Here, we run all the algorithms (Matlab code) on a computer
with a 3GHz CPU and 32GB RAM. Since our Inter-CP with
CGC leads to significantly better results, we prefer it to
CA+SA in practice, despite its relatively larger time cost.

VI. Conclusions 

In this paper, we have investigated the challenging problem
of pairwise constraint propagation on multi-view data. By
decomposing the inter-view constraint propagation problem
into a set of independent semi-supervised learning subproblems,
we have uniformly formulated them as minimizing a
regularized energy functional. More importantly, these
semi-supervised learning subproblems can be solved efficiently
using label propagation on a $k$-NN graph. We have then developed
two constrained graph construction methods for our inter-view
constraint propagation, and the two resulting graphs can be
considered as variants of the $k$-NN graph. The experimental
results in cross-view retrieval have shown the promising
performance of our inter-view constraint propagation with
constrained graph construction. For future work, our method
will be extended to other multi-view tasks.